Detecting Alu insertions from high-throughput sequencing data

Authors

  • Matei David
  • Harun Mustafa
  • Michael Brudno
Abstract

High-throughput sequencing technologies have allowed for the cataloguing of variation in personal human genomes. In this manuscript, we present alu-detect, a tool that combines read-pair and split-read information to detect novel Alus and their precise breakpoints directly from either whole-genome or whole-exome sequencing data while also identifying insertions directly in the vicinity of existing Alus. To set the parameters of our method, we use simulation of a faux reference, which allows us to compute the precision and recall of various parameter settings using real sequencing data. Applying our method to 100 bp paired Illumina data from seven individuals, including two trios, we detected on average 1519 novel Alus per sample. Based on the faux-reference simulation, we estimate that our method has 97% precision and 85% recall. We identify 808 novel Alus not previously described in other studies. We also demonstrate the use of alu-detect to study the local sequence and global location preferences for novel Alu insertions.
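The precision and recall estimate described in the abstract can be illustrated with a small sketch. It is not alu-detect's own evaluation code. It assumes the faux reference gives a list of known insertion sites as `(chromosome, position)` pairs, and that a call counts as correct when it falls within a chosen distance of a known site that has not already been matched. The function name `precision_recall` and the 50 bp default `tolerance` are hypothetical choices made for illustration.

```python
def precision_recall(truth, calls, tolerance=50):
    """Estimate precision and recall of insertion calls against known sites.

    truth, calls: lists of (chromosome, position) tuples.
    tolerance: maximum breakpoint distance in bp for a match
               (an assumed value, not taken from the paper).
    """
    # Group the known sites by chromosome so each call only
    # compares against sites on its own chromosome.
    unmatched = {}
    for chrom, pos in truth:
        unmatched.setdefault(chrom, []).append(pos)

    true_positives = 0
    for chrom, pos in calls:
        candidates = unmatched.get(chrom, [])
        # Nearest known site that has not yet been matched.
        best = min(candidates, key=lambda p: abs(p - pos), default=None)
        if best is not None and abs(best - pos) <= tolerance:
            candidates.remove(best)  # each known site matches at most once
            true_positives += 1

    precision = true_positives / len(calls) if calls else 0.0
    recall = true_positives / len(truth) if truth else 0.0
    return precision, recall


# Toy data, invented for illustration only.
truth = [("chr1", 1000), ("chr1", 5000), ("chr2", 300)]
calls = [("chr1", 1010), ("chr2", 900)]
precision, recall = precision_recall(truth, calls)
```

Running this kind of comparison once per parameter setting shows how each setting trades precision against recall.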

Similar resources

Mobile element scanning (ME-Scan) identifies thousands of novel Alu insertions in diverse human populations.

Alu retrotransposons are the most numerous and active mobile elements in humans, causing genetic disease and creating genomic diversity. Mobile element scanning (ME-Scan) enables comprehensive and affordable identification of mobile element insertions (MEI) using targeted high-throughput sequencing of multiplexed MEI junction libraries. In a single experiment, ME-Scan identifies nearly all AluY...

Retrotransposon mobilization in cancer genomes

The Cancer Genome Atlas project was initiated by the National Cancer Institute in order to characterize the genomes of hundreds of tumors of various cancer types. While much effort has been put into detecting somatic genomic variation in these data, somatic structural variation induced by the activity of transposable element insertions has not been reported. Transposable elements (TEs) are part...

Genome-wide LORE1 retrotransposon mutagenesis and high-throughput insertion detection in Lotus japonicus.

Use of insertion mutants facilitates functional analysis of genes, but it has been difficult to identify a suitable mutagen and to establish large populations for reverse genetics in most plant species. The main challenge is developing efficient high-throughput procedures for both mutagenesis and identification of insertion sites. To date, only floral-dip T-DNA transformation of Arabidopsis has...

Discovery and characterization of Alu repeat sequences via precise local read assembly

Alu insertions have contributed to >11% of the human genome and ∼30-35 Alu subfamilies remain actively mobile, yet the characterization of polymorphic Alu insertions from short-read data remains a challenge. We build on existing computational methods to combine Alu detection and de novo assembly of WGS data as a means to reconstruct the full sequence of insertion events from Illumina paired end...

Alu repeat discovery and characterization within human genomes.

Human genomes are now being rapidly sequenced, but not all forms of genetic variation are routinely characterized. In this study, we focus on Alu retrotransposition events and seek to characterize differences in the pattern of mobile insertion between individuals based on the analysis of eight human genomes sequenced using next-generation sequencing. Applying a rapid read-pair analysis algorith...

Journal title:

Volume 41, Issue

Pages -

Publication date: 2013